Square Root Law for Communication with Low
Probability of Detection on AWGN Channels

Boulat A. Bash, Dennis Goeckel, Don Towsley 

Abstract 

We present a square root limit on low probability of detection (LPD) communication over additive
white Gaussian noise (AWGN) channels. Specifically, if a warden has an AWGN channel to the
transmitter with non-zero noise power, we prove that $o(\sqrt{n})$ bits can be sent from the transmitter
to the receiver in $n$ AWGN channel uses with probability of detection by the warden less than $\epsilon$ for
any $\epsilon > 0$. Moreover, in most practical scenarios, a lower bound on the noise power on the warden's
channel to the transmitter is known and $O(\sqrt{n})$ bits can be covertly sent in $n$ channel uses. Conversely,
attempting to transmit more than $O(\sqrt{n})$ bits either results in detection by the warden with probability
one or a non-zero probability of decoding error as $n \to \infty$. Further, we show that LPD communication
on the AWGN channel allows one to send a non-zero symbol on every channel use, in contrast to what
might be expected from the square root law found recently in image-based steganography.

I. Introduction 

Securing information transmitted over wireless links is of paramount concern for consumer, 
industrial, and military applications. The taxonomy of network security classifies secure com- 
munication into two distinct categories: low probability of intercept (LPI) communication and 
low probability of detection (LPD) communication [1]. In recent years, the wireless networking
community has made tremendous strides in the former area, securing data transmitted in wireless 
networks from interception by an untrusted eavesdropper using various encryption and key 
exchange protocols. However, the latter area, LPD communication, which concerns the prevention 
of transmissions from being detected in the first place, has been relatively underexplored. 

Consider a node that is trying to send data on a wireless channel to another node so that the 
presence of this transmission is not detected by an eavesdropping third party. There are many 
real-life scenarios where this is preferable to standard cryptographic security. Encrypted data 
arouses suspicion, and even the most theoretically robust encryption can often be defeated by 
a determined adversary using non-computational methods such as side-channel analysis. Thus, 
the study of covert communications over LPD channels is extremely important. 

We examine the fundamental limitations of covert communication over wireless links. In our 
system Alice transmits covert data over a wireless channel to Bob, while passive eavesdropper 
Warden Willie attempts to decide whether what he hears is noise or covert communication. 
Willie is passive in that he does not actively jam Alice's channel. However, if he detects covert 
communication, he can potentially shut the channel down or otherwise punish Alice. Such a 
scenario requires Alice to blend in with the background noise by softly "whispering" to Bob, 
as sending a high-power signal will surely tip Willie off to the covert transmission. 

Our problem is related to the problem of imperfect steganography, but the two problems are not
the same. Standard steganography considers hiding information by altering the properties of
fixed-size, finite-alphabet covertext objects (such as images or software binary code), with imperfect
steganography systems allowing some fixed probability of detection of hidden information.
Covertext can be considered a type of lossless finite-alphabet channel. However, the square root
law recently found in this environment [2], which states that $O(\sqrt{n})$ symbols in the covertext
of size $n$ may safely be modified to hide a steganographic message, is limited in its scope. The
continuous-valued channel allows us to spread hidden information across every symbol used in 
the transmission, thus showing that a direct application of the steganographic result quickly leads 
to contradiction and demonstrating the distinction between the two problems. In fact, our square 
root law can be viewed as generalizing the square root law for imperfect steganography. 

In our scenario, Alice communicates with Bob over a channel subject to additive white 
Gaussian noise (AWGN), while Willie attempts to detect her transmission. The channel between 
Willie and Alice is also subject to AWGN. Alice sends low-power covert signals to Bob that 
Willie attempts to classify as either noise on his channel from Alice or Alice's signals to Bob. If 
the noise on the channel between Willie and Alice has non-zero power, Alice can communicate 
with Bob while tolerating a certain probability of detection, which she can drive down by 
transmitting with low enough power. Thus, Alice potentially gets non-zero mutual information 
across the covert channel to Bob in n uses of the channel. We state our main result that limits 
mutual information on the covert channel between Alice and Bob using asymptotic notation 
where $f(n) = O(g(n))$ denotes an asymptotically tight upper bound on $f(n)$ (i.e. there exist
positive constants $m$ and $n_0$ such that $0 \le f(n) \le m g(n)$ for all $n \ge n_0$), $f(n) = o(g(n))$
denotes an upper bound on $f(n)$ that is not asymptotically tight (i.e. for any constant $m > 0$,
there exists constant $n_0 > 0$ such that $0 \le f(n) < m g(n)$ for all $n \ge n_0$), and $f(n) = \omega(g(n))$
denotes a lower bound on $f(n)$ that is not asymptotically tight (i.e. for any constant $m > 0$,
there exists constant $n_0 > 0$ such that $0 \le m g(n) < f(n)$ for all $n \ge n_0$) [3, Ch. 3.1]:

Theorem (Square root law). Suppose the channels between Alice and each of Bob and Willie
experience independent additive white Gaussian noise (AWGN) with power $\sigma_b^2 > 0$ and $\sigma_w^2 > 0$,
respectively, where $\sigma_b^2$ and $\sigma_w^2$ are constants. Then, for any $\epsilon > 0$ and unknown $\sigma_w^2$, Alice can send
$o(\sqrt{n})$ information bits to Bob in $n$ channel uses while maintaining a probability of detection of
Alice's transmission by Willie of less than $\epsilon$. Moreover, if Alice can lower-bound $\sigma_w^2 \ge \hat{\sigma}_w^2$, she
can send $O(\sqrt{n})$ bits in $n$ channel uses while maintaining a probability of detection of less than
$\epsilon$. Conversely, if Alice attempts to transmit $\omega(\sqrt{n})$ bits in $n$ channel uses, then, as $n \to \infty$, either
Willie detects her with arbitrarily low probability of error or Bob cannot decode her message
reliably (i.e. with arbitrarily low probability of decoding error).

After introducing our discrete-time channel model and hypothesis testing background in
Section II, we prove the achievability of the square root law in Section III. We then prove
the converse in Section IV. We discuss the mapping to the continuous-time channel and the
implications of channel fading on our results, as well as the relationship to previous work, in
Section V, and conclude in Section VI.

II. Prerequisites 

A. Channel Model 

We use the discrete-time AWGN channel model with real-valued symbols (and defer discussion
of the mapping to a continuous-time channel as well as a fading channel to Section V). Our
formal system framework is depicted in Figure 1. In our scenario, Alice transmits a vector of
$n$ real-valued symbols $\mathbf{f} = \{f_i\}_{i=1}^n$. Bob receives vector $\mathbf{y}_b = \{y_i^{(b)}\}_{i=1}^n$, where $y_i^{(b)} = f_i + z_i^{(b)}$
with independent and identically distributed (i.i.d.) $z_i^{(b)} \sim \mathcal{N}(0, \sigma_b^2)$. Willie observes vector
$\mathbf{y}_w = \{y_i^{(w)}\}_{i=1}^n$, where $y_i^{(w)} = f_i + z_i^{(w)}$, with i.i.d. $z_i^{(w)} \sim \mathcal{N}(0, \sigma_w^2)$. Willie uses statistical
hypothesis tests on $\mathbf{y}_w$ to determine whether Alice has communicated, which we discuss next.

Fig. 1. System framework: Alice encodes information into a vector of real symbols $\mathbf{f} = \{f_i\}_{i=1}^n$ and transmits it on an AWGN
channel to Bob, while Willie attempts to classify his vector of observations of the channel from Alice as either an AWGN
vector $\mathbf{z}_w = \{z_i^{(w)}\}_{i=1}^n$ or a vector $\{f_i + z_i^{(w)}\}_{i=1}^n$ of transmissions corrupted by AWGN.
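
As an illustration of the channel model, the following minimal Python sketch draws one block of observations for Bob and Willie. The function name `simulate_block`, the seed, and the noise powers are illustrative assumptions and are not part of the formal framework.

```python
import random

def simulate_block(symbols, noise_var_b, noise_var_w, rng):
    """Pass Alice's real-valued symbols through two independent AWGN channels.

    Returns (y_b, y_w): Bob's and Willie's observation vectors.
    """
    sd_b = noise_var_b ** 0.5
    sd_w = noise_var_w ** 0.5
    y_b = [f + rng.gauss(0.0, sd_b) for f in symbols]
    y_w = [f + rng.gauss(0.0, sd_w) for f in symbols]
    return y_b, y_w

rng = random.Random(7)   # fixed seed so the sketch is repeatable
n = 20_000
silent = [0.0] * n       # Alice does not transmit: Willie reads pure noise
y_b, y_w = simulate_block(silent, noise_var_b=1.0, noise_var_w=2.0, rng=rng)

# Average power of Willie's readings; it concentrates around noise_var_w.
empirical_power_w = sum(y * y for y in y_w) / n
```

When Alice is silent, the average power of Willie's readings concentrates around his noise power as $n$ grows, which is the baseline against which any transmission must hide.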

B. Hypothesis Testing 

Willie expects vector $\mathbf{y}_w$ of $n$ channel readings to be consistent with his channel noise model.
He performs a statistical hypothesis test on this vector, with the null hypothesis $H_0$ being that
Alice is not covertly communicating. This corresponds to each sample $y_i^{(w)} \sim \mathcal{N}(0, \sigma_w^2)$ i.i.d.
The alternate hypothesis $H_1$ is that Alice is transmitting, which corresponds to samples $y_i^{(w)}$
from a different distribution. Willie tolerates some false positives: cases when his statistical test
incorrectly accuses Alice. This rejection of $H_0$ when it is true is known as the type I error
(or false alarm), and, following the standard nomenclature, we denote its probability by $\alpha$ [4].
Willie's test may also miss Alice's covert transmissions. Acceptance of $H_0$ when it is false
is known as the type II error (or miss), and we denote its probability by $\beta$. The sum $\alpha + \beta$
determines the performance of a hypothesis test.



III. Achievability of Square Root Law

In our scenario, Alice and Bob construct a covert communications system, with all the details 
known to Willie except for a secret key shared before communication. This follows "best 
practices" in security system design, as our system obeys Kerckhoffs's law [5] because its security
depends only on the key [6]. Since this work concerns the limits of covert communication, key
size is not a constraint and we defer the study of key efficiency to future work. 

Willie's objective is to determine whether Alice transmitted covert data given the vector
of observations $\mathbf{y}_w$ of his channel from Alice. Denote the probability distribution of Willie's
channel observations when Alice does not transmit (i.e. when $H_0$ is true) as $\mathbb{P}_0$, and the
probability distribution of the observations when Alice transmits (i.e. when $H_1$ is true) as $\mathbb{P}_1$. To
strengthen the achievability result, we assume that Alice's channel input distribution, as well as
the distribution of AWGN on the channel between Alice and Willie, are known to Willie. Then
$\mathbb{P}_0$ and $\mathbb{P}_1$ are known to Willie, and he can construct an optimal statistical hypothesis test that
minimizes the sum of error probabilities $\alpha + \beta$ [4, Ch. 13]. The following holds for such a test:

Fact 1 (Theorem 13.1.1 in [4]). For the optimal test:

$$\alpha + \beta = 1 - \mathrm{TV}(\mathbb{P}_0, \mathbb{P}_1)$$

where $\mathrm{TV}(\mathbb{P}_0, \mathbb{P}_1)$ is the total variation distance between $\mathbb{P}_0$ and $\mathbb{P}_1$ defined as follows:

Definition 1 (Total variation distance [4]). The total variation distance between two probability
measures $\mathbb{P}_0$ and $\mathbb{P}_1$ on a $\sigma$-algebra $\mathcal{A}$ is

$$\mathrm{TV}(\mathbb{P}_0, \mathbb{P}_1) = \sup\{|\mathbb{P}_0(A) - \mathbb{P}_1(A)| : A \in \mathcal{A}\} = \frac{1}{2}\|p_0(x) - p_1(x)\|_1 \qquad (1)$$

where $p_0(x)$ and $p_1(x)$ are densities¹ of $\mathbb{P}_0$ and $\mathbb{P}_1$, respectively, and $\|a - b\|_1$ is the $L_1$ norm.

Since total variation lower-bounds the error of all hypothesis tests Willie can use, a clever 
choice of f allows Alice to limit Willie's detector performance. Unfortunately, the total variation 
metric is unwieldy for the products of probability measures, which are used in the analysis of 
the vectors of observations. We thus use Pinsker's Inequality: 

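Fact 2 (Pinsker's Inequality (Lemma 11.6.1 in [7])).

$$\frac{1}{2}\|p_0(x) - p_1(x)\|_1 = \frac{1}{2}\int |p_0(x) - p_1(x)|\,dx \le \sqrt{\frac{1}{2} D(\mathbb{P}_0\|\mathbb{P}_1)}$$

where relative entropy $D(\mathbb{P}_0\|\mathbb{P}_1)$ is defined as follows:

Definition 2. The relative entropy between two probability measures $\mathbb{P}_0$ and $\mathbb{P}_1$ is:

$$D(\mathbb{P}_0\|\mathbb{P}_1) = \int_{\mathcal{X}} p_0(x) \ln\frac{p_0(x)}{p_1(x)}\,dx \qquad (2)$$

where $\mathcal{X}$ is the support of $p_1(x)$.

If $\mathbb{P}^n$ is the distribution of a sequence $\{X_i\}_{i=1}^n$ where each $X_i \sim \mathbb{P}$ is i.i.d., then:

Fact 3 (Relative Entropy Product). From the chain rule for relative entropy [7, (2.67)]:

$$D(\mathbb{P}_0^n\|\mathbb{P}_1^n) = n D(\mathbb{P}_0\|\mathbb{P}_1)$$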
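
To make Facts 1 through 3 concrete, the following Python sketch evaluates the relative entropy between two zero-mean Gaussians in closed form, applies the product rule and Pinsker's Inequality, and compares the resulting bound with a numerically integrated total variation distance for a single symbol. The function names, variances, and integration grid are illustrative assumptions; the closed form used is the standard relative entropy between zero-mean Gaussians, in nats.

```python
import math

def kl_zero_mean_gaussians(var0, var1):
    """D(N(0, var0) || N(0, var1)) in nats."""
    ratio = var0 / var1
    return 0.5 * (ratio - 1.0 - math.log(ratio))

def pinsker_tv_bound(kl_per_symbol, n):
    """Upper bound on TV between n-fold products: sqrt((n / 2) * D)."""
    return math.sqrt(0.5 * n * kl_per_symbol)

def gaussian_pdf(x, var):
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def tv_zero_mean_gaussians(var0, var1, steps=20_000, span=10.0):
    """Midpoint-rule estimate of (1/2) * integral of |p0(x) - p1(x)| dx."""
    half_width = span * math.sqrt(max(var0, var1))
    dx = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        x = -half_width + (j + 0.5) * dx
        total += abs(gaussian_pdf(x, var0) - gaussian_pdf(x, var1))
    return 0.5 * total * dx

kl = kl_zero_mean_gaussians(1.0, 1.1)       # noise only vs. noise plus weak signal
tv_single = tv_zero_mean_gaussians(1.0, 1.1)
tv_bound_single = pinsker_tv_bound(kl, 1)   # Pinsker bound for n = 1
```

The Pinsker bound is loose for a single symbol, but it scales cleanly with $n$ through Fact 3, which is what the achievability proof exploits.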

Now we are ready to prove the achievability theorem under an average power constraint. 

Theorem 1.1 (Achievability). Suppose Willie's channel is subject to AWGN with constant power
$\sigma_w^2 > 0$. Then Alice can maintain Willie's sum of the probabilities of detection errors $\alpha + \beta \ge 1 - \epsilon$
for any $\epsilon > 0$ while covertly transmitting $o(\sqrt{n})$ bits to Bob over $n$ uses of an AWGN channel
if $\sigma_w^2$ is unknown and $O(\sqrt{n})$ bits over $n$ channel uses if she can lower-bound $\sigma_w^2 \ge \hat{\sigma}_w^2$.

Proof: Construction: Alice's channel encoder takes input in blocks of size $M$ bits and
encodes them into codewords of length $n$ at the rate of $R = M/n$ bits/symbol. We employ
random coding arguments and independently generate $2^{nR}$ codewords $\{c(W_k), k = 1, 2, \ldots, 2^{nR}\}$
from $\mathbb{R}^n$ for messages $W_k$, each according to $p_X(\mathbf{x}) = \prod_{i=1}^n p_X(x_i)$, where $X \sim \mathcal{N}(0, P_f)$ and
$P_f$ is defined later. The codebook is the secret key shared between Alice and Bob, and is not
revealed to Willie. However, Willie knows how it is constructed, including the value of $P_f$.

¹In the case of discrete $\mathbb{P}_0$ and $\mathbb{P}_1$, $p_0(x)$ and $p_1(x)$ are p.m.f.'s. In our work, $\mathbb{P}_0$ and $\mathbb{P}_1$ are continuous.

The channel between Alice and Willie is corrupted by AWGN with power $\sigma_w^2$. Willie uses
statistical hypothesis testing on a vector of $n$ channel readings $\mathbf{y}_w$ to decide whether Alice
transmitted. Next we show how Alice can limit the performance of Willie's methods.

Analysis: Consider the case when Alice transmits codeword $c(W_k)$. Suppose that Willie
employs a detector that implements an optimal hypothesis test on his $n$ channel readings. His
null hypothesis $H_0$ is that Alice did not transmit and he observed noise on his channel. His
alternate hypothesis $H_1$ is that Alice transmitted and he observed Alice's codeword corrupted
by noise. By Fact 1, the sum of the probabilities of Willie's detector's errors is expressed by
$\alpha + \beta = 1 - \mathrm{TV}(\mathbb{P}_0, \mathbb{P}_1)$, where the total variation distance is between the distribution $\mathbb{P}_0$ of $n$
noise readings that Willie expects to observe under his null hypothesis and the distribution $\mathbb{P}_1$
of the covert codeword transmitted by Alice corrupted by noise. Alice can lower-bound the sum
of the error probabilities by upper-bounding the total variation distance: $\mathrm{TV}(\mathbb{P}_0, \mathbb{P}_1) \le \epsilon$.

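The realizations of noise $z_i^{(w)}$ in vector $\mathbf{z}_w$ are zero-mean i.i.d. Gaussian random variables with
variance $\sigma_w^2$, and, thus, $\mathbb{P}_0 = \mathbb{P}_w^n$ where $\mathbb{P}_w = \mathcal{N}(0, \sigma_w^2)$. Recall that Willie does not know the
codebook. Therefore, Willie's probability distribution of the transmitted symbols is of zero-mean
i.i.d. Gaussian random variables with variance $P_f$. Since noise is independent of the transmitted
symbols, when Alice transmits, Willie observes vector $\mathbf{y}_w$, where $y_i^{(w)} \sim \mathcal{N}(0, P_f + \sigma_w^2) = \mathbb{P}_s$
is i.i.d., and thus, $\mathbb{P}_1 = \mathbb{P}_s^n$. By Pinsker's Inequality and Fact 3:

$$\mathrm{TV}(\mathbb{P}_w^n, \mathbb{P}_s^n) \le \sqrt{\frac{1}{2} D(\mathbb{P}_w^n\|\mathbb{P}_s^n)} = \sqrt{\frac{n}{2} D(\mathbb{P}_w\|\mathbb{P}_s)}$$

The relative entropy follows as:

$$D(\mathbb{P}_w\|\mathbb{P}_s) = \frac{1}{2}\left[\ln\left(1 + \frac{P_f}{\sigma_w^2}\right) - \frac{P_f}{P_f + \sigma_w^2}\right]$$

While the expression for $D(\mathbb{P}_w\|\mathbb{P}_s)$ has a closed form, the Taylor series expansion of
$D(\mathbb{P}_w\|\mathbb{P}_s)$ with respect to $P_f$ around $P_f = 0$ is more useful. While the zeroth and first order
terms are zero, the second order term is:

$$\frac{P_f^2}{2!} \times \left.\frac{\partial^2 D(\mathbb{P}_w\|\mathbb{P}_s)}{\partial P_f^2}\right|_{P_f = 0} = \frac{P_f^2}{4\sigma_w^4}$$

For the third order term we obtain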

$$\frac{P_f^3}{3!} \times \left.\frac{\partial^3 D(\mathbb{P}_w\|\mathbb{P}_s)}{\partial P_f^3}\right|_{P_f = 0} = -\frac{P_f^3}{3\sigma_w^6}$$

If $P_f < \sigma_w^2$, then the Taylor series converges and we can apply Taylor's Theorem to upper-bound
relative entropy with the second order term. The upper bound we seek is:

$$\mathrm{TV}(\mathbb{P}_w^n, \mathbb{P}_s^n) \le \frac{P_f}{2\sigma_w^2}\sqrt{\frac{n}{2}} \qquad (3)$$

Suppose Alice sets her average covert symbol power $P_f \le \frac{c f(n)}{\sqrt{n}}$, where $c = 2\epsilon\sqrt{2}$. In most
practical scenarios Alice can lower-bound $\sigma_w^2 \ge \hat{\sigma}_w^2$ and set $f(n) = \hat{\sigma}_w^2$ (a conservative lower
bound is the thermal noise power of the best receiver currently available). If $\sigma_w^2$ is unknown,
select $f(n)$ such that $f(n) = o(1)$ and $f(n) = \omega(1/\sqrt{n})$ (the latter condition is used to bound
Bob's decoding error probability). In either case, for $n$ large enough, $P_f < \sigma_w^2$ satisfies the Taylor
series convergence criterion, and Alice obtains the upper bound $\mathrm{TV}(\mathbb{P}_w^n, \mathbb{P}_s^n) \le \epsilon$, limiting the
performance of Willie's detector.
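
The power assignment can be checked numerically. The sketch below assumes the symbol power $P_f = c f(n)/\sqrt{n}$ with $c = 2\epsilon\sqrt{2}$ and the bound (3) read as $\frac{P_f}{2\sigma_w^2}\sqrt{n/2}$; the function names and the numerical values of $\epsilon$, $\sigma_w^2$ and its lower bound are illustrative assumptions.

```python
import math

def covert_symbol_power(n, eps, f_n):
    """Average symbol power P_f = c * f(n) / sqrt(n), with c = 2 * eps * sqrt(2)."""
    c = 2.0 * eps * math.sqrt(2.0)
    return c * f_n / math.sqrt(n)

def tv_upper_bound(n, p_f, noise_var_w):
    """Bound (3): TV <= P_f / (2 * sigma_w^2) * sqrt(n / 2)."""
    return p_f / (2.0 * noise_var_w) * math.sqrt(n / 2.0)

eps = 0.1
noise_var_w = 2.0        # Willie's true noise power (unknown to Alice)
noise_var_w_lower = 1.5  # Alice's lower bound, used as f(n)

bounds = {}
for n in (100, 10_000, 1_000_000):
    p_f = covert_symbol_power(n, eps, noise_var_w_lower)
    bounds[n] = tv_upper_bound(n, p_f, noise_var_w)
```

With these choices the bound equals $\epsilon f(n)/\sigma_w^2$ for every $n$, so it stays below $\epsilon$ whenever $f(n) \le \sigma_w^2$, while the per-symbol power itself shrinks as $1/\sqrt{n}$.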

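Since Alice's symbol power $P_f$ is a decreasing function of the codeword length $n$, the standard
channel coding results for constant power do not directly apply. Thus, we examine the probability
$P_e$ of Bob's decoding error averaged over all possible codebooks. Let Bob employ a
maximum-likelihood (ML) decoder (i.e. minimum distance) to process the received vector $\mathbf{y}_b$ when $c(W_k)$
was sent. The decoder makes an error when $\mathbf{y}_b$ is closer to another codeword $c(W_i)$, $i \ne k$:

$$P_e = \mathbb{E}\left[P\left(\bigcup_{i \ne k}\left\{\|\mathbf{y}_b - c(W_i)\|_2 < \|\mathbf{y}_b - c(W_k)\|_2\right\}\right)\right] \qquad (4)$$

$$\le \mathbb{E}\left[\sum_{i \ne k} P\left(\|\mathbf{y}_b - c(W_i)\|_2 < \|\mathbf{y}_b - c(W_k)\|_2\right)\right] \qquad (5)$$

where (5) follows from the union bound. Let $d = \|c(W_k) - c(W_i)\|_2$ be the distance between
two codewords, where $\|\cdot\|_2$ is the $L_2$ norm. Since codewords are independent and Gaussian,
each component of $c(W_k) - c(W_i)$ is distributed as $\mathcal{N}(0, 2P_f)$ and $d^2 = 2P_f U$, where $U \sim \chi_n^2$, with $\chi_n^2$ denoting the chi-squared
distribution with $n$ degrees of freedom. Therefore, by [8, (3.44)]:

$$P_e \le \sum_{i \ne k} \mathbb{E}_U\left[Q\left(\sqrt{\frac{P_f U}{2\sigma_b^2}}\right)\right] \qquad (6)$$

where $Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2}\,dt$. Since $Q(x) \le \frac{1}{2}e^{-x^2/2}$ [9, (5)] and $P_f = \frac{c f(n)}{\sqrt{n}}$:

$$\mathbb{E}_U\left[Q\left(\sqrt{\frac{P_f U}{2\sigma_b^2}}\right)\right] \le \int_0^\infty \frac{1}{2} e^{-\frac{c f(n) u}{4\sqrt{n}\sigma_b^2}} \frac{u^{\frac{n}{2}-1} e^{-\frac{u}{2}}}{2^{n/2}\,\Gamma(n/2)}\,du \overset{(a)}{=} \frac{1}{2}\left(1 + \frac{c f(n)}{2\sqrt{n}\sigma_b^2}\right)^{-n/2}$$

where (a) is from the substitution $v = u\left(\frac{1}{2} + \frac{c f(n)}{4\sqrt{n}\sigma_b^2}\right)$ and the definition of the Gamma function.
Hence, the summand in (6) does not depend on $i$, and (6) becomes:

$$P_e \le 2^{nR - \frac{n}{2}\log_2\left(1 + \frac{c f(n)}{2\sqrt{n}\sigma_b^2}\right)}$$

Since $f(n) = \omega(1/\sqrt{n})$, if rate $R = \frac{\rho}{2}\log_2\left(1 + \frac{c f(n)}{2\sqrt{n}\sigma_b^2}\right)$ for a constant $\rho < 1$, as $n$
increases, the probability of Bob's decoding error decays exponentially to zero and Bob obtains
$nR = \frac{n\rho}{2}\log_2\left(1 + \frac{c f(n)}{2\sqrt{n}\sigma_b^2}\right)$ covert bits in $n$ channel uses. Since $nR \le \frac{\sqrt{n}\rho c f(n)}{4\sigma_b^2 \ln 2}$, approaching
equality as $n$ gets very large, Bob receives $o(\sqrt{n})$ bits in $n$ channel uses, and $O(\sqrt{n})$ bits in $n$
channel uses if $f(n) = \hat{\sigma}_w^2$. ■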
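
The square root scaling of the covert payload can be tabulated directly. The sketch below assumes the rate takes the form $R = \frac{\rho}{2}\log_2\left(1 + \frac{c f(n)}{2\sqrt{n}\sigma_b^2}\right)$ with $c = 2\epsilon\sqrt{2}$, which is a reading of the rate expression in the proof above, and compares $nR$ with the ceiling $\frac{\sqrt{n}\rho c f(n)}{4\sigma_b^2\ln 2}$. The function names and parameter values are illustrative assumptions.

```python
import math

def covert_bits(n, eps, f_n, noise_var_b, rho=0.9):
    """nR for R = (rho / 2) * log2(1 + c f(n) / (2 sqrt(n) sigma_b^2))."""
    c = 2.0 * eps * math.sqrt(2.0)
    snr_term = c * f_n / (2.0 * math.sqrt(n) * noise_var_b)
    # log1p keeps precision when snr_term is tiny (large n).
    return 0.5 * n * rho * math.log1p(snr_term) / math.log(2.0)

def covert_bits_ceiling(n, eps, f_n, noise_var_b, rho=0.9):
    """Upper limit sqrt(n) * rho * c * f(n) / (4 sigma_b^2 ln 2), from ln(1 + x) <= x."""
    c = 2.0 * eps * math.sqrt(2.0)
    return math.sqrt(n) * rho * c * f_n / (4.0 * noise_var_b * math.log(2.0))

eps, f_n, noise_var_b = 0.1, 1.0, 1.0
table = {n: (covert_bits(n, eps, f_n, noise_var_b),
             covert_bits_ceiling(n, eps, f_n, noise_var_b))
         for n in (10**2, 10**4, 10**6, 4 * 10**6)}
```

Quadrupling the block length roughly doubles the number of covert bits, and the payload approaches the ceiling as $n$ grows, in line with the $O(\sqrt{n})$ statement for constant $f(n)$.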

Implications of a peak power constraint 

Since most practical systems are peak-power constrained, we show that the square root law
holds for the binary input Gaussian output channel using a proof similar to that of Theorem 1.1.

Theorem 1.2 (Achievability under a peak power constraint). Suppose Alice's transmitter is
subject to the peak power constraint $h$ and Willie's channel is subject to AWGN with power
$\sigma_w^2 > 0$. Then Alice can maintain Willie's sum of the probabilities of detection errors $\alpha + \beta \ge 1 - \epsilon$
for any $\epsilon > 0$ while covertly transmitting $o(\sqrt{n})$ bits to Bob over $n$ uses of an AWGN channel
if $\sigma_w^2$ is unknown and $O(\sqrt{n})$ bits in $n$ channel uses if she can lower-bound $\sigma_w^2 \ge \hat{\sigma}_w^2$.

Proof: Construction: Alice encodes the input in blocks of size $M$ bits into codewords of
length $n$ at the rate $R = M/n$ bits/symbol with the symbols drawn from alphabet $\{-a, a\}$,
where $a$ satisfies the peak power constraint $a^2 \le h$ and is defined later. We independently
generate $2^{nR}$ codewords $\{c(W_k), k = 1, 2, \ldots, 2^{nR}\}$ for messages $W_k$ from $\{-a, a\}^n$ according
to $p_X(\mathbf{x}) = \prod_{i=1}^n p_X(x_i)$, where $p_X(-a) = p_X(a) = \frac{1}{2}$. As in the proof of Theorem 1.1, the
codebook is a secret key shared between Alice and Bob, but Willie knows how it is constructed,
including the value of $a$.

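Analysis: When Alice transmits a covert symbol during the $i^{\text{th}}$ symbol period, she transmits $-a$
or $a$ equiprobably by construction and Willie observes the covert symbol corrupted by AWGN.
Therefore, $\mathbb{P}_s = \frac{1}{2}\left(\mathcal{N}(-a, \sigma_w^2) + \mathcal{N}(a, \sigma_w^2)\right)$, and, with $\mathbb{P}_w = \mathcal{N}(0, \sigma_w^2)$, we have:

$$D(\mathbb{P}_w\|\mathbb{P}_s) = \int_{-\infty}^{\infty} \frac{e^{-\frac{x^2}{2\sigma_w^2}}}{\sqrt{2\pi}\sigma_w} \ln\frac{2e^{-\frac{x^2}{2\sigma_w^2}}}{e^{-\frac{(x+a)^2}{2\sigma_w^2}} + e^{-\frac{(x-a)^2}{2\sigma_w^2}}}\,dx$$

There is no closed-form expression for $D(\mathbb{P}_w\|\mathbb{P}_s)$, but it can be expanded using the Taylor
series with respect to $a$ around $a = 0$. While the zeroth through third order terms are zero, the
fourth order term is:

$$\frac{a^4}{4!} \times \left.\frac{\partial^4 D(\mathbb{P}_w\|\mathbb{P}_s)}{\partial a^4}\right|_{a = 0} = \frac{a^4}{4\sigma_w^4}$$

While the fifth order term is zero, for the sixth order term we obtain:

$$\frac{a^6}{6!} \times \left.\frac{\partial^6 D(\mathbb{P}_w\|\mathbb{P}_s)}{\partial a^6}\right|_{a = 0} = -\frac{a^6}{3\sigma_w^6}$$

If $a < \sigma_w$, then the Taylor series converges and we can apply Taylor's Theorem and upper-bound
relative entropy with the fourth order term. The upper bound we seek is:

$$\mathrm{TV}(\mathbb{P}_w^n, \mathbb{P}_s^n) \le \frac{a^2}{2\sigma_w^2}\sqrt{\frac{n}{2}} \qquad (7)$$

Since the power of Alice's covert symbol is $a^2 = P_f$, (7) is identical to (3) and Alice sets $a^2 \le \frac{c f(n)}{\sqrt{n}}$,
where $c$ and $f(n)$ are defined as in Theorem 1.1. Then, for $n$ large enough, $a < \sigma_w$ satisfies
the Taylor series convergence criterion, and Alice obtains the upper bound $\mathrm{TV}(\mathbb{P}_w^n, \mathbb{P}_s^n) \le \epsilon$,
limiting the performance of Willie's detector.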
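
Because the relative entropy for the binary-input case has no closed form, a numerical check of the expansion is useful. The sketch below integrates $D(\mathbb{P}_w\|\mathbb{P}_s)$ for $\mathbb{P}_w = \mathcal{N}(0, \sigma_w^2)$ and $\mathbb{P}_s = \frac{1}{2}(\mathcal{N}(-a, \sigma_w^2) + \mathcal{N}(a, \sigma_w^2))$ with a midpoint rule and compares it against the leading term $a^4/(4\sigma_w^4)$. The function names, grid size, and the values $a = 0.3$, $\sigma_w^2 = 1$ are illustrative assumptions.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def kl_noise_vs_binary_signal(a, noise_var_w, steps=40_000, span=10.0):
    """Midpoint-rule estimate of D(P_w || P_s) in nats for antipodal symbols +/- a."""
    half_width = span * math.sqrt(noise_var_w) + a
    dx = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        x = -half_width + (j + 0.5) * dx
        p_w = gaussian_pdf(x, 0.0, noise_var_w)
        p_s = 0.5 * (gaussian_pdf(x, -a, noise_var_w) + gaussian_pdf(x, a, noise_var_w))
        total += p_w * math.log(p_w / p_s)
    return total * dx

a, noise_var_w = 0.3, 1.0
kl_numeric = kl_noise_vs_binary_signal(a, noise_var_w)
fourth_order = a ** 4 / (4.0 * noise_var_w ** 2)   # leading Taylor term
sixth_order = -a ** 6 / (3.0 * noise_var_w ** 3)   # next non-zero term

n = 1_000
tv_via_pinsker = math.sqrt(0.5 * n * kl_numeric)
tv_via_bound_7 = a ** 2 / (2.0 * noise_var_w) * math.sqrt(n / 2.0)
```

For small $a/\sigma_w$ the numerical value sits just below the fourth order term, so replacing the relative entropy by $a^4/(4\sigma_w^4)$ inside Pinsker's Inequality gives a valid, slightly conservative bound.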

As in Theorem 1.1, we cannot directly apply the standard constant-power channel coding
results to our system where the symbol power is a decreasing function of the codeword length.
We upper-bound Bob's decoding error probability by analyzing a suboptimal decoding scheme.
Suppose Bob uses a hard-decision device on each received covert symbol $y_i^{(b)} = f_i + z_i^{(b)}$ via the
rule $\hat{f}_i = \{a \text{ if } y_i^{(b)} \ge 0;\ -a \text{ otherwise}\}$, and applies an ML decoder on its output. The effective
channel for the encoder/decoder pair is a binary symmetric channel with cross-over probability
$p_e = Q(a/\sigma_b)$ and the probability of the decoding error averaged over all possible codebooks is
$P_e \le 2^{nR - n(1 - \mathcal{H}(p_e))}$ [10], where $\mathcal{H}(p) = -p\log_2 p - (1 - p)\log_2(1 - p)$ is the binary entropy
function. We expand the analysis in [11, Section 1.2.1] to characterize the rate $R$. The Taylor
series of $e^{-t^2/2}$ alternates, and the Taylor series expansion of $p_e = Q\left(\frac{a}{\sigma_b}\right) = \frac{1}{2} - \frac{1}{\sqrt{2\pi}}\int_0^{a/\sigma_b} e^{-t^2/2}\,dt$ with
respect to $a$ around $a = 0$ (which converges since $a$ is small for large $n$) yields an upper bound:
$p_e \le \frac{1}{2} - \frac{1}{\sqrt{2\pi}}\left(\frac{a}{\sigma_b} - \frac{a^3}{6\sigma_b^3}\right) \triangleq p_e^{(UB)}$. Since $\mathcal{H}(p)$ is a monotonically increasing function on the
interval $\left[0, \frac{1}{2}\right]$, $\mathcal{H}(p_e) \le \mathcal{H}(p_e^{(UB)})$. The odd terms of the Taylor series expansion of $\mathcal{H}(p_e^{(UB)})$
with respect to $a$ around $a = 0$ are zero, and, thus, $\mathcal{H}(p_e^{(UB)}) = 1 - \frac{a^2}{\sigma_b^2 \pi \ln 2} + O(a^4)$. Since
$a^2 = \frac{c f(n)}{\sqrt{n}}$, $P_e \le 2^{nR - \frac{\sqrt{n} c f(n)}{\sigma_b^2 \pi \ln 2} + O(1)}$. Since $f(n) = \omega(1/\sqrt{n})$, if rate $R = \frac{\rho c f(n)}{\sqrt{n}\sigma_b^2 \pi \ln 2}$ bits/symbol
for a constant $\rho < 1$, the probability of Bob's decoding error decays exponentially to zero as $n$
increases and Bob obtains $nR = o(\sqrt{n})$ bits in $n$ channel uses, and $O(\sqrt{n})$ bits in $n$ channel
uses if $f(n) = \hat{\sigma}_w^2$. ■
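
The small-amplitude behaviour of the hard-decision channel can be verified numerically. The sketch below assumes the cross-over probability $p_e = Q(a/\sigma_b)$, the truncated-series upper bound $\frac{1}{2} - \frac{1}{\sqrt{2\pi}}\left(\frac{a}{\sigma_b} - \frac{a^3}{6\sigma_b^3}\right)$, and the approximation $1 - \mathcal{H}(p_e) \approx \frac{a^2}{\sigma_b^2\pi\ln 2}$, which are readings of the expressions in the proof above. The function names and the values $a = 0.05$, $\sigma_b^2 = 1$ are illustrative assumptions.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def crossover_upper_bound(a, noise_var_b):
    """1/2 - (x - x^3 / 6) / sqrt(2 pi), with x = a / sigma_b."""
    x = a / math.sqrt(noise_var_b)
    return 0.5 - (x - x ** 3 / 6.0) / math.sqrt(2.0 * math.pi)

def hard_decision_capacity(a, noise_var_b):
    """1 - H(p_e) for the binary symmetric channel seen after hard decisions."""
    p_e = q_function(a / math.sqrt(noise_var_b))
    return 1.0 - binary_entropy(p_e)

def small_amplitude_approximation(a, noise_var_b):
    """Leading term a^2 / (sigma_b^2 * pi * ln 2) of 1 - H(p_e)."""
    return a * a / (noise_var_b * math.pi * math.log(2.0))

a, noise_var_b = 0.05, 1.0
p_e = q_function(a / math.sqrt(noise_var_b))
exact_gap = hard_decision_capacity(a, noise_var_b)
approx_gap = small_amplitude_approximation(a, noise_var_b)
```

Since $a^2$ scales as $1/\sqrt{n}$, the per-symbol rate supported by hard decisions also scales as $1/\sqrt{n}$, which is where the $\sqrt{n}$ payload in the peak-power case comes from.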

Remarks 

Relationship with the Gaussian wire-tap channel [12]: Consider $\sigma_b^2 > \sigma_w^2$. From Theorems
1.1 and 1.2, Alice can send a positive number of bits covertly to Bob; however, the secrecy
capacity of the Gaussian wire-tap channel [12] is zero. This seems paradoxical until we consider
that in the wire-tap scenario, Alice's objective is to prevent Willie from decoding her message
to Bob. She fails when $\sigma_b^2 > \sigma_w^2$ because Willie can decode any message she sends to Bob using
public codebooks, as the capacity of Willie's channel to her is greater than Bob's. However, 
here Alice and Bob's codebook is private and Willie's ability to distinguish Alice's transmission 
from random noise is limited by the sum of the probabilities of his detection errors, which is 
controlled by Alice employing a constrained transmission power. Thus, provided they agree on 
a codebook beforehand, Alice can covertly communicate to Bob even when the channel from 
Alice to Willie is less noisy than the channel from Alice to Bob. 

Relationship with Square Root Law in Steganography: It has recently been shown that in
finite-alphabet imperfect steganographic systems at most $O(\sqrt{n})$ symbols in the original covertext of
size $n$ may safely be modified to hide a steganographic message [2]. From the steganographic
perspective, our covertext is the noise on Willie's channel to Alice. However, our result does not
obey their converse, as we can modify all symbols in our covertext, highlighting the different
nature of the problem scenarios. Nevertheless, it is worthwhile to consider a scenario where
roughly $rn$ out of $n$ symbols are used to carry the message.

Let's construct the codebook in two stages. First, flip a biased coin $n$ times and set the $i^{\text{th}}$
symbol in every codeword to one with probability $r$ (and zero otherwise). Denote the number of
symbols set to one as $\eta$ and note that $\mathbb{E}[\eta] = rn$. We complete the codebook by independently
generating $2^{nR}$ vectors of length $\eta$ according to $p_X(\mathbf{x}) = \prod_{i=1}^{\eta} p_X(x_i)$, where $X \sim \mathcal{N}(0, P_f)$,
and assigning the values of these vectors to the cells in the corresponding codewords that contain
ones. Thus, a codeword uses $rn$ symbols on average (over all codebooks), and all codewords
contain zeros in identical locations, facilitating decoding for Bob. The coin flip is independent of
both the symbol and the channel noise. When Alice is transmitting a codeword, the distribution
of each of Willie's observations is $\mathbb{P}_s = (1 - r)\mathcal{N}(0, \sigma_w^2) + r\mathcal{N}(0, P_f + \sigma_w^2)$ and, thus,

$$D(\mathbb{P}_w\|\mathbb{P}_s) = -\int_{-\infty}^{\infty} \frac{e^{-\frac{x^2}{2\sigma_w^2}}}{\sqrt{2\pi}\sigma_w} \ln\left(1 - r + \frac{r\sigma_w}{\sqrt{\sigma_w^2 + P_f}}\, e^{\frac{P_f x^2}{2\sigma_w^2(\sigma_w^2 + P_f)}}\right) dx$$

There is no closed-form expression for $D(\mathbb{P}_w\|\mathbb{P}_s)$, but a Taylor series expansion with respect
to $P_f$ around $P_f = 0$ yields the following bound:

$$\mathrm{TV}(\mathbb{P}_w^n, \mathbb{P}_s^n) \le \frac{r P_f}{2\sigma_w^2}\sqrt{\frac{n}{2}} \qquad (8)$$

The only difference in (8) from (3) is $r$ in the numerator. Thus, if Alice sets the product
$rP_f = \frac{c f(n)}{\sqrt{n}}$, with $c$ and $f(n)$ as previously defined, she limits the performance of Willie's
detector. This product is the average symbol power used by Alice. It is easy to verify that in
the peak power constrained scenario Alice should set product $ra^2 = \frac{c f(n)}{\sqrt{n}}$ and that the number
of bits that Alice can covertly transmit to Bob obeys the previously derived square root bounds.
This demonstrates the richness of our scenario and the generality of our square root law.

IV. Converse 

Here, as in the achievability, the channel between Alice and Bob is subject to AWGN of
power $\sigma_b^2$. Alice's objective is to covertly transmit a message $W_k$ that is $M = \omega(\sqrt{n})$ bits long
to Bob in $n$ channel uses with arbitrarily small probability of decoding error as $n$ gets large.
Alice encodes each message $W_k$ arbitrarily into $n$ symbols at the rate $R = M/n$ bits/symbol.
For an upper bound on the reduction in entropy, the messages are chosen equiprobably.

Willie observes all $n$ of Alice's channel uses. To strengthen the converse, he is oblivious to
her signal properties. Nevertheless, even with Willie's knowledge limited, Alice cannot transmit
a message with $\omega(\sqrt{n})$ bits of information in $n$ channel uses without either being detected by
Willie or having Bob suffer a non-zero decoding error.

Theorem 2. If over $n$ channel uses, Alice attempts to transmit a covert message to Bob that is
$\omega(\sqrt{n})$ bits long, then, as $n \to \infty$, either Willie can detect her with arbitrarily low sum of error
probabilities $\alpha + \beta$, or Bob cannot decode with arbitrarily low probability of error.

Proof: To detect Alice's covert transmissions, Willie performs the following hypothesis test:

$$H_0: y_i^{(w)} = z_i^{(w)}, \quad i = 1, \ldots, n$$

$$H_1: y_i^{(w)} = f_i + z_i^{(w)}, \quad i = 1, \ldots, n$$

Rejection of $H_0$ means that Alice is covertly communicating with Bob. First, we show how
Willie can bound the errors $\alpha$ and $\beta$ of this test as a function of Alice's signal parameters. Then
we show that if Alice prevents Willie's test from detecting her by adjusting her signal power,
Bob cannot decode her transmissions without error.

To perform the test, Willie collects a vector $\mathbf{y}_w$ of $n$ independent readings from his channel to
Alice and generates the test statistic $S = \frac{\mathbf{y}_w^T \mathbf{y}_w}{n}$, where $\mathbf{x}^T$ denotes the transpose of vector $\mathbf{x}$. Under
the null hypothesis $H_0$ Alice does not transmit and Willie reads AWGN on his channel. Thus,
$y_i^{(w)} \sim \mathcal{N}(0, \sigma_w^2)$, and the mean and the variance of $S$ when $H_0$ is true are:

$$\mathbb{E}[S] = \sigma_w^2 \qquad (9)$$

$$\mathrm{Var}[S] = \frac{2\sigma_w^4}{n} \qquad (10)$$

Suppose Alice transmits codeword $c(W_k) = \{f_i^{(k)}\}_{i=1}^n$. Then Willie's vector of observations
$\mathbf{y}_{w,k} = \{y_i^{(w,k)}\}_{i=1}^n$ contains readings of mean-shifted noise $y_i^{(w,k)} \sim \mathcal{N}(f_i^{(k)}, \sigma_w^2)$. The mean
of each squared observation is $\mathbb{E}[y_i^2] = \sigma_w^2 + (f_i^{(k)})^2$, while the variance is $\mathrm{Var}[y_i^2] = \mathbb{E}[y_i^4] - (\mathbb{E}[y_i^2])^2 = 4(f_i^{(k)})^2\sigma_w^2 + 2\sigma_w^4$. Denote the average power per symbol of codeword $c(W_k)$ by
$P_k = \frac{c(W_k)^T c(W_k)}{n}$. Then the mean and variance of $S$ when Alice transmits message $W_k$ are:

$$\mathbb{E}[S] = \sigma_w^2 + P_k \qquad (11)$$

$$\mathrm{Var}[S] = \frac{4P_k\sigma_w^2 + 2\sigma_w^4}{n} \qquad (12)$$

The variance of Willie's test statistic (12) is computed by adding the variances conditioned on
$c(W_k)$ of the squared individual observations $\mathrm{Var}[y_i^2]$ (and dividing by $n^2$) since the noise on
the individual observations is independent.

The probability distribution for the vector of Willie's observations depends on which hypothesis
is true. Denote $\mathbb{P}_0$ as the distribution when $H_0$ holds, and $\mathbb{P}_1^{(k)}$ when $H_1$ holds with Alice
transmitting message $W_k$. While $\mathbb{P}_1^{(k)}$ is conditioned on Alice's codeword, we show that the
power of the codeword determines its detectability by this detector, and that our result applies
to all codewords with power of the same order.

If $H_0$ is true, then $S$ should be close to $\sigma_w^2$. Willie picks some threshold $t$ and compares the
value of $S$ to $\sigma_w^2 + t$. He accepts $H_0$ if $S < \sigma_w^2 + t$ and rejects it otherwise. Suppose that he
desires false positive probability $\alpha^*$, which is the probability that $S \ge \sigma_w^2 + t$ when $H_0$ is true.
We bound it using (9) and (10) with Chebyshev's Inequality [7, (3.32)]:

$$\alpha = \mathbb{P}_0(S \ge \sigma_w^2 + t) \le \mathbb{P}_0(|S - \sigma_w^2| \ge t) \le \frac{2\sigma_w^4}{nt^2}$$

Thus, to obtain $\alpha^*$, Willie sets $t = \frac{d}{\sqrt{n}}$, where $d = \frac{\sqrt{2}\sigma_w^2}{\sqrt{\alpha^*}}$ is a constant. As $n$ increases, $t$ decreases,
which is consistent with Willie gaining greater confidence with more observations.

Suppose Alice transmits $W_k$. Then the probability of a miss $\beta^{(k)}$ given $t$ is the probability
that $S < \sigma_w^2 + t$, which we bound using (11) and (12) with Chebyshev's Inequality:

$$\beta^{(k)} = \mathbb{P}_1^{(k)}(S < \sigma_w^2 + t) \le \mathbb{P}_1^{(k)}(|S - \sigma_w^2 - P_k| \ge P_k - t) \le \frac{4P_k\sigma_w^2 + 2\sigma_w^4}{n(P_k - t)^2} \qquad (13)$$

If $P_k = \omega(1/\sqrt{n})$, $\lim_{n \to \infty} \beta^{(k)} = 0$. Thus, with enough observations, Willie can detect with
arbitrarily low error probability Alice's codewords with average symbol power $P_k = \omega(1/\sqrt{n})$.
Note that Willie's detector is oblivious to any details of Alice's codebook construction.
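
The behaviour of Willie's power detector can be tabulated from the Chebyshev bounds alone. The sketch below assumes the threshold $t = \sqrt{2}\sigma_w^2/\sqrt{\alpha^* n}$ and the miss bound (13) read as $\frac{4P_k\sigma_w^2 + 2\sigma_w^4}{n(P_k - t)^2}$, valid when $P_k > t$. The function names and parameter values are illustrative assumptions.

```python
import math

def detector_threshold(n, noise_var_w, alpha_star):
    """Threshold t = d / sqrt(n), with d = sqrt(2) * sigma_w^2 / sqrt(alpha*)."""
    return math.sqrt(2.0) * noise_var_w / math.sqrt(alpha_star * n)

def miss_probability_bound(n, p_k, noise_var_w, alpha_star):
    """Chebyshev bound on the miss probability for a codeword of power p_k."""
    t = detector_threshold(n, noise_var_w, alpha_star)
    if p_k <= t:
        return 1.0  # the Chebyshev step needs P_k > t; fall back to the trivial bound
    bound = (4.0 * p_k * noise_var_w + 2.0 * noise_var_w ** 2) / (n * (p_k - t) ** 2)
    return min(1.0, bound)

noise_var_w, alpha_star = 1.0, 0.05
block_lengths = (10**4, 10**6, 10**8)

# Codeword power decaying slower than 1/sqrt(n): Willie's miss bound vanishes.
loud = [miss_probability_bound(n, n ** -0.25, noise_var_w, alpha_star)
        for n in block_lengths]

# Codeword power on the order of 1/sqrt(n): the bound gives Willie no guarantee.
quiet = [miss_probability_bound(n, n ** -0.5, noise_var_w, alpha_star)
         for n in block_lengths]
```

The contrast between the two power schedules is the numerical face of the converse: only power on the order of $1/\sqrt{n}$ or below escapes this simple detector.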

By (13), if Alice desires to lower-bound the sum of the probabilities of error of Willie's
statistical test by $\alpha + \beta \ge \zeta > 0$, she must use low-power codewords; in particular, a fraction
$\gamma > 0$ of the codewords must have $P_{\mathcal{U}} = O(1/\sqrt{n})$. Let's denote this set of codewords as $\mathcal{U}$
and examine the probability of Bob's decoding error $P_e$. The probability that a message from
set $\mathcal{U}$ is sent is $P(\mathcal{U}) = \gamma$, as all messages are equiprobable. We bound $P_e = P_e(\mathcal{U})P(\mathcal{U}) + P_e(\bar{\mathcal{U}})P(\bar{\mathcal{U}}) \ge \gamma P_e(\mathcal{U})$, where $\bar{\mathcal{U}}$ is the complement of $\mathcal{U}$ and $P_e(\mathcal{U})$ is the probability of
decoding error when a message from $\mathcal{U}$ is sent:

$$P_e(\mathcal{U}) = \frac{1}{|\mathcal{U}|}\sum_{W \in \mathcal{U}} P_e(c(W) \text{ transmitted}) \qquad (14)$$

where $P_e(c(W) \text{ transmitted})$ is the probability of error when codeword $c(W)$ is transmitted,
$|\cdot|$ denotes the set cardinality operator, and (14) holds because all messages are equiprobable.

When Bob uses the optimal decoder, $P_e(c(W) \text{ transmitted})$ is the probability that Bob decodes
the received signal as $\hat{W} \ne W$. This is the probability of a union of events $E_j$, where $E_j$ is the
event that sent message $W$ is decoded as some other message $W_j \ne W$:

$$P_e(c(W) \text{ transmitted}) = P\left(\bigcup_{j: W_j \ne W} E_j\right) \ge P\left(\bigcup_{W_j \in \mathcal{U}\setminus\{W\}} E_j\right) \triangleq P_e^{(\mathcal{U})} \qquad (15)$$

where the inequality in (15) holds since the sets in the second union are contained in the first. From
the decoder perspective, this is due to the decrease in the decoding error probability if Bob knew
that the message came from $\mathcal{U}$ (reducing the set of messages on which the decoder can err).

Our analysis of $P_e^{(\mathcal{U})}$ uses Cover's simplification of Fano's inequality similar to the proof of
the converse to the coding theorem for Gaussian channels in [7, Ch. 9.2]. Since we are interested
in $P_e^{(\mathcal{U})}$, we do not absorb it into $\epsilon_n$ as done in (9.37) of [7]. Rather, we explicitly use:

$$H(W|\hat{W}) \le 1 + (\log_2|\mathcal{U}|)P_e^{(\mathcal{U})} \qquad (16)$$

where $H(W|\hat{W})$ denotes the entropy of message $W$ conditioned on Bob's decoding $\hat{W}$ of $W$.

Noting that the size of the set $\mathcal{U}$ from which the messages are drawn is $\gamma 2^{nR}$ and that, since
each message is equiprobable, the entropy of a message $W$ from $\mathcal{U}$ is $H(W) = \log_2|\mathcal{U}| = \log_2\gamma + nR$, we utilize (16) and carry out steps (9.38)-(9.53) in [7] to obtain:

$$P_e^{(\mathcal{U})} \ge 1 - \frac{\frac{1}{2}\log_2\left(1 + \frac{P_{\mathcal{U}}}{\sigma_b^2}\right) + \frac{1}{n}}{\frac{\log_2\gamma}{n} + R} \qquad (17)$$

Since Alice transmits $\omega(\sqrt{n})$ bits in $n$ channel uses, her rate is $R = \omega(1/\sqrt{n})$ bits/symbol.
However, $P_{\mathcal{U}} = O(1/\sqrt{n})$, and, as $n \to \infty$, $P_e^{(\mathcal{U})}$ is bounded away from zero. Since $\gamma > 0$, $P_e$
is bounded away from zero if Alice tries to beat Willie's simple hypothesis test. ■
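
The final step of the converse can be illustrated numerically. The sketch below assumes the Fano-type bound takes the form $P_e^{(\mathcal{U})} \ge 1 - \frac{\frac{1}{2}\log_2(1 + P_{\mathcal{U}}/\sigma_b^2) + 1/n}{(\log_2\gamma)/n + R}$, which is a reading of (17) obtained from (16) and the Gaussian channel mutual information bound. The function name and the schedules $R = n^{-1/4}$, $P_{\mathcal{U}} = n^{-1/2}$ are illustrative assumptions.

```python
import math

def fano_error_lower_bound(n, rate, p_u, noise_var_b, gamma):
    """Lower bound on the decoding error over the low-power codeword set."""
    numerator = 0.5 * math.log2(1.0 + p_u / noise_var_b) + 1.0 / n
    denominator = math.log2(gamma) / n + rate
    if denominator <= 0.0:
        return 0.0  # the bound is vacuous when the rate is too small
    return max(0.0, 1.0 - numerator / denominator)

noise_var_b, gamma = 1.0, 0.5
lower_bounds = []
for n in (10**4, 10**6, 10**8):
    rate = n ** -0.25   # omega(1 / sqrt(n)) bits per symbol
    p_u = n ** -0.5     # power kept at the O(1 / sqrt(n)) detection limit
    lower_bounds.append(fano_error_lower_bound(n, rate, p_u, noise_var_b, gamma))
```

With the power pinned at the detection limit and the rate decaying more slowly than $1/\sqrt{n}$, the lower bound on Bob's error probability climbs toward one as $n$ grows.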



Goodput of Alice's Communication 

Define the goodput $G(n)$ of Alice's communication as the average number of bits that Bob can
receive from Alice over $n$ channel uses with non-zero probability of a message being undetected
as $n \to \infty$. Since only $\mathcal{U}$ contains such messages, by (17), the probability of her message being
successfully decoded by Bob is $P_s^{(\mathcal{U})} = 1 - P_e^{(\mathcal{U})} = O\left(\frac{1}{R\sqrt{n}}\right)$ and the goodput is
$G(n) = \gamma P_s^{(\mathcal{U})} R n = O(\sqrt{n})$. Thus, Alice cannot break the square root law using an arbitrarily high
transmission rate while keeping the power under Willie's detection threshold.
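
The goodput argument can be checked numerically with a minimal sketch. It assumes the success probability over $\mathcal{U}$ is capped by $\left(\frac{1}{2}\log_2(1 + P_{\mathcal{U}}/\sigma_b^2) + 1/n\right) / \left((\log_2 \gamma)/n + R\right)$, with $P_{\mathcal{U}} = 1/\sqrt{n}$, $\gamma = 0.5$, and unit noise variance; these constants and the function name are hypothetical choices for illustration only.

```python
import math


def goodput_upper_bound(n, rate, noise_var=1.0, gamma=0.5):
    """Upper bound on G(n) = gamma * P_s * R * n under the assumptions above."""
    power = 1.0 / math.sqrt(n)  # P_U = O(1/sqrt(n)), assumed constant 1
    capacity = 0.5 * math.log2(1.0 + power / noise_var)
    denom = math.log2(gamma) / n + rate
    # Success probability is at most 1 and at most the Fano-type ratio.
    p_success = 1.0 if denom <= 0.0 else min(1.0, (capacity + 1.0 / n) / denom)
    return gamma * p_success * rate * n


# Try rates R = n^(-a): a = 0.5 matches the square root law, smaller a is faster.
ratios_by_exponent = {}
for a in (0.5, 0.25, 0.0):
    ratios_by_exponent[a] = [
        goodput_upper_bound(n, n ** -a) / math.sqrt(n)
        for n in (10 ** 4, 10 ** 6, 10 ** 8)
    ]
    print(a, [round(r, 4) for r in ratios_by_exponent[a]])
```

In this sketch the ratio $G(n)/\sqrt{n}$ stays bounded for every rate exponent tried, which is the behavior the $O(\sqrt{n})$ goodput claim describes.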

V. Discussion

A. Mapping to Continuous-time Channel

We employ a discrete-time model throughout the paper. However, whereas this is a common 
assumption made without loss of generality in standard communication theory, it is important to 
consider whether some aspect of the LPD problem has been missed by starting in discrete-time. 

Consider the standard communication system model, where Alice's (baseband) continuous-time
waveform would be given in terms of her discrete-time transmitted sequence by:

$$x(t) = \sum_{i=1}^{n} f_i \, p(t - i T_s)$$

where $T_s$ is the symbol period and $p(\cdot)$ is the pulse shaping waveform. Consider a (baseband)
system bandwidth constraint of $W$ Hz. Now, if Alice chooses $p(\cdot)$ ideally as $\mathrm{sinc}(2Wt)$, where
$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$, then the natural choice of $T_s = 1/(2W)$ results in no intersymbol interference
(ISI). From the Nyquist sampling criterion, both Willie and Bob can extract all of the information
from the signaling band by sampling at a rate of $2W$ samples/second, which then leads
directly to the discrete-time model of Section II and suits our demonstration of the fundamental
limits to Alice's covert channel capabilities. However, when p(-) is chosen in a more practical 
fashion, for example, as a raised cosine pulse with some excess bandwidth, then sampling at a 
rate higher than 2W has utility for signal detection even if the Nyquist ISI criterion is satisfied. 
In particular, techniques involving cyclostationary detection are now applicable, and we consider 
such a scenario a promising area for future work. 
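
The zero-ISI property of the ideal sinc pulse can be sketched numerically. The example below builds $x(t) = \sum_i f_i \, p(t - iT_s)$ with $p(t) = \mathrm{sinc}(2Wt)$ and $T_s = 1/(2W)$, then samples it at the symbol instants. The bandwidth, the number of symbols, and the Gaussian symbol values are arbitrary assumptions made for illustration.

```python
import numpy as np


def waveform(symbols, t, bandwidth_hz):
    """Evaluate x(t) = sum_i f_i * sinc(2W (t - i*T_s)) with T_s = 1/(2W)."""
    ts = 1.0 / (2.0 * bandwidth_hz)
    idx = np.arange(1, len(symbols) + 1)  # symbol indices i = 1..n
    # np.sinc is the normalized sinc, sin(pi x) / (pi x).
    pulses = np.sinc(2.0 * bandwidth_hz * (t[:, None] - idx[None, :] * ts))
    return pulses @ symbols


rng = np.random.default_rng(0)
bandwidth_hz = 1000.0                      # assumed W
symbols = rng.normal(scale=0.1, size=16)   # assumed low-power symbols f_i
ts = 1.0 / (2.0 * bandwidth_hz)

# Sampling at the Nyquist rate 2W, aligned with the symbol instants i*T_s.
sample_times = np.arange(1, len(symbols) + 1) * ts
recovered = waveform(symbols, sample_times, bandwidth_hz)
max_isi = float(np.max(np.abs(recovered - symbols)))
print("largest deviation from transmitted symbols:", max_isi)
```

Each sample coincides with one transmitted symbol up to floating-point error, which is why sampling at $2W$ reduces this idealized case to the discrete-time model.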

B. Fading and Shadowing 

Fading and shadowing will impact both the capacity of the channel from Alice to Bob and 
the ability for Willie to detect Alice's transmission. There are a number of different models 
that could be employed to incorporate these effects. However, while these models will have a 
significant impact as we move toward practical systems, they are unlikely to have an impact on 
the asymptotic results presented here. 

C. Relationship to Previous Work 

The LPD communication problem is related to the problem of establishing a cognitive radio
(CR) network [13]. An aspect of the CR problem is limiting the interference from the secondary
users' radios to the primary users of the network. The LPD problem with a passive warden can
be cast within this framework by having primary users only listen [14]. However, the properties
of the secondary signal that allow smooth operation of the primary network are very different
from those of an undetectable signal. While there is a lot of work on the former topic, we are
not aware of work by the CR community on the latter issue.

Analytical evaluation of LPD communication has been sparse. Hero studies LPI/LPD channels
[1] in a multiple-input multiple-output (MIMO) setting. However, he focuses on the constraints
(s.t. power, fourth moment, etc.) that the LPD communication over a MIMO channel should 
enforce given the kind of information the adversary possesses and on the signaling methods that 
maximize the throughput of the channel given those constraints. While he recognizes that an 
LPD communication system is constrained by average power, he does not analyze the constraint 
asymptotically and, thus, does not obtain the square root law. It is notable that the LPI portion 
of his work has drawn significant attention, while the LPD portion has been largely overlooked. 

Unlike LPD communication, much analytical work has been done on steganography. An
excellent survey of work prior to 1999 is provided by Petitcolas [15]. A lot of the research effort
in this area focuses on measuring the security of steganographic systems, which is particularly
important for imperfect steganography as it allows the user to quantify the risk of being detected.
Proposals for measures of security include relative entropy [16], Fisher information [17], as well
as the metric of [18], which is similar to the sum of Willie's detection errors that we employ. As
noted in the remark in Section III, the square root law was found in finite-alphabet imperfect
steganography [2]. However, although their goal is the same as ours (hiding information with low
probability of detection by Willie), their model based on hiding information in finite-alphabet
images is very different from ours. As demonstrated by the constructions of Section III, our
scenario is arguably richer, and its additional degree of freedom in the choice of transmission
power allows Alice to alter all $n$ symbols used in transmission while maintaining a fixed detection
probability, which stands in contrast to the finite-alphabet steganography result.
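
To illustrate how scaling the power lets Alice use every symbol, the sketch below bounds the sum of Willie's detection errors from below by $1 - \sqrt{D/2}$ (Pinsker's inequality), where $D$ is the relative entropy in nats between $n$ i.i.d. noise-only observations and $n$ observations with Alice transmitting. It assumes, for illustration, that Willie sees zero-mean Gaussian samples of variance $\sigma_w^2$ when Alice is silent and $\sigma_w^2 + P_f$ when she sends Gaussian symbols of power $P_f$; the constant $c = 0.2$, unit noise variance, and the function name are hypothetical.

```python
import math


def willie_error_sum_lower_bound(n, power, noise_var=1.0):
    """Lower bound on (false alarm + missed detection) via Pinsker's inequality."""
    x = power / noise_var
    # KL divergence per symbol between N(0, s) and N(0, s + P), in nats:
    # 0.5 * (ln(1 + x) + 1/(1 + x) - 1), written in a numerically stable form.
    kl_per_symbol = 0.5 * (math.log1p(x) - x / (1.0 + x))
    total_kl = n * kl_per_symbol  # i.i.d. observations add up
    return max(0.0, 1.0 - math.sqrt(total_kl / 2.0))


c = 0.2
lengths = (10 ** 2, 10 ** 4, 10 ** 6, 10 ** 8)
# Power per symbol shrinking as c / sqrt(n): every symbol is still non-zero.
scaled = [willie_error_sum_lower_bound(n, c / math.sqrt(n)) for n in lengths]
# Constant power per symbol for comparison.
fixed = [willie_error_sum_lower_bound(n, c) for n in lengths]
print("scaled power:", [round(v, 4) for v in scaled])
print("fixed power: ", [round(v, 4) for v in fixed])
```

With power proportional to $1/\sqrt{n}$ the bound stays roughly constant as $n$ grows, whereas with fixed power it collapses to zero, so the bound no longer offers Alice any guarantee.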

VI. Conclusion 

Practitioners have always known that LPD communication requires one to use low power 
in order to blend in with the noise on the eavesdropping warden's channel. However, the 
specific requirements for achieving LPD communication and resulting achievable performance 
have seldom been analyzed prior to this work. We quantified the conditions for existence and 
maintenance of an LPD channel by proving that the LPD communication is subject to a square 
root law in that the number of bits that can be covertly transmitted in $n$ channel uses is $O(\sqrt{n})$.
An interesting result in our work is the fact that one can use all of the n symbols with positive 
power to transmit the covert messages. 

There are a number of avenues for future research. Practical network settings and the impli- 
cations of the square root law on the covert transmission of packets under additional constraints 
such as delay should be analyzed. The impact of dynamism in the network should also be 
examined, as well as more realistic scenarios that include channel artifacts such as fading and 
interference from other nodes. One may be able to improve LPD communication by employing 
nodes that perform friendly jamming. Eventually, we would like to answer this fundamental 
question: is it possible to establish and maintain a "shadow" wireless network in the presence 
of both active and passive wardens? 